Abstract

We answer the following question of R. L. Graham: What is the discrepancy
of the lexicographically-least binary de Bruijn sequence? Here,
"discrepancy" refers to the maximum (absolute) difference between the
number of ones and the number of zeros in any initial segment of the
sequence. We show that the answer is $\Theta(2^n \log n / n)$.

1 Introduction 

A binary de Bruijn sequence of order $k$ is a word $a_1 \cdots a_{2^k}$ over the alphabet
$\{0, 1\}$ that contains every $k$-word exactly once as a subword when the indices
are interpreted cyclically. It is well known (see, e.g., [6]) that the number of de
Bruijn cycles of order $k$ is given by

$$2^{2^{k-1} - k}.$$

Among these is the "Ford sequence"^, the remarkable cyclic binary word which 
is 

1. the lexicographically least de Bruijn sequence,

2. the result of applying the least-first greedy algorithm to constructing a de
Bruijn sequence (starting with $1^k$),

3. the result of concatenating all "Lyndon" words (lexicographically minimal
representatives of free conjugacy classes) of each length dividing $k$ in
lexicographic order, and

4. the de Bruijn sequence generated by a shift register whose truth table has 
minimum weight. 



^See the excellent survey [3] for a history of this and related sequences. The eponym, due 
to Fredricksen, refers to a 1957 unpublished manuscript of Ford ([2]). However, subsequent 
research has revealed earlier references. In [3], the author proposes that a 1934 paper of 
Martin ([Z]) is the earliest appearance. 
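
The least-first greedy construction in item 2 can be sketched as follows. This is a minimal illustration rather than code from the paper: the function name `greedy_least_first` and the decision to rotate the output so that it begins right after the initial block $1^k$ are assumptions made here.

```python
def greedy_least_first(k):
    """Least-first ("prefer zero") greedy construction, started from 1^k.

    Returns the cyclic word of length 2^k as a string, rotated so that it
    begins right after the initial block of k ones.
    """
    seq = [1] * k
    seen = {tuple(seq)}
    while True:
        for bit in (0, 1):                       # try 0 first, then 1
            window = tuple(seq[len(seq) - k + 1:] + [bit])
            if window not in seen:               # new k-word: accept this bit
                seen.add(window)
                seq.append(bit)
                break
        else:
            break                                # neither extension is new: stop
    cycle = seq[:2 ** k]                         # drop the k-1 wrap-around symbols
    rotated = cycle[k:] + cycle[:k]              # start after the leading 1^k
    return "".join(map(str, rotated))
```

For $k = 3$ the greedy walk produces `1110001011`, whose first eight symbols rotate to `00010111`.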
Since the greedy algorithm uses 0's before 1's whenever possible, it is natural to
suspect that this special sequence has an excess of 0's early on, i.e., the difference
between the number of 0's and 1's in initial segments is large. Indeed, Huang
comments in [3] that

The "prefer one" algorithm proposed by Fredricksen joins the pure
cycles of [a] circulating register (CR) in order according to the weights 
of the n-tuples... so some part of the sequence may contain many 
heavily weighted n-tuples and it leads to a bad local 0-1 balance. 

R. L. Graham therefore asks for the maximum "discrepancy." In the present
note, we show that it has order $2^n \log n / n$.

Define the equivalence relation $\sim$ ("conjugacy") on binary words by setting
$xy \sim yx$ for any $x, y \in \{0, 1\}^*$. For a word $w \in \{0, 1\}^*$, define $w^\circ$ to be the
lexicographically least element of the $\sim$-equivalence class $[w]$ of $w$. If $w$ is aperiodic
(i.e., if $w = xy$ with $x, y \neq \epsilon$, then $w \neq yx$), then $w^\circ$ is called a "Lyndon word."
Then the lexicographically least binary order-$n$ de Bruijn sequence $\mathcal{L}_n$ consists
of the concatenation of all Lyndon words of length dividing $n$, in lexicographic
order.
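
The construction just described can be checked directly for small $n$. The sketch below is a brute-force illustration written for this purpose, not the authors' code; the helper names `is_lyndon`, `lex_least_de_bruijn` and `is_de_bruijn` are hypothetical, and the enumeration is only practical for small orders.

```python
from itertools import product

def is_lyndon(w):
    """True if w is strictly smaller than each of its proper rotations.

    Strictness rules out periodic words, whose rotations repeat w itself.
    """
    return all(w < w[i:] + w[:i] for i in range(1, len(w)))

def lex_least_de_bruijn(n):
    """Concatenate, in lexicographic order, the Lyndon words of length dividing n."""
    words = []
    for d in range(1, n + 1):
        if n % d == 0:
            for bits in product("01", repeat=d):
                w = "".join(bits)
                if is_lyndon(w):
                    words.append(w)
    # Python string comparison is lexicographic, with a proper prefix sorting first.
    return "".join(sorted(words))

def is_de_bruijn(s, n):
    """Check that every n-word occurs exactly once when s is read cyclically."""
    doubled = s + s[:n - 1]
    windows = {doubled[i:i + n] for i in range(len(s))}
    return len(s) == 2 ** n and len(windows) == 2 ** n
```

For $n = 4$ the Lyndon words in order are `0`, `0001`, `0011`, `01`, `0111`, `1`, which concatenate to `0000100110101111`.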

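For a word $w \in \{0, 1\}^*$, write $w_k$ for its $k$-th symbol from left to right, starting
with zero. Then we define the discrepancy of $w$ to be

$$\mathrm{disc}(w) = \max_{M} \left| \sum_{k=0}^{M} (-1)^{w_k} \right|.$$

Theorem 1. $\mathrm{disc}(\mathcal{L}_n) = \Theta(2^n \log n / n)$.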

We conjecture a slightly stronger statement: 

Conjecture 1. There is some $C$ so that $\lim_{n \to \infty} \mathrm{disc}(\mathcal{L}_n) \cdot n / (2^n \log n) = C$.
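
For small $n$, the quantity in Theorem 1 can be computed and compared with $2^n \log n / n$. The following sketch is illustrative only. It assumes the standard recursive Lyndon-word generator (often attributed to Fredricksen, Kessler and Maiorana) as a faster substitute for the defining concatenation, and the names `disc` and `normalized_disc` are hypothetical.

```python
import math

def lex_least_de_bruijn(n):
    """Lyndon-word concatenation via the standard recursive generator (bits as ints)."""
    a = [0] * (n + 1)
    out = []

    def extend(t, p):
        if t > n:
            if n % p == 0:                 # a[1..p] is a Lyndon word with p | n
                out.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            extend(t + 1, p)
            if a[t - p] == 0:              # binary alphabet: the only larger symbol is 1
                a[t] = 1
                extend(t + 1, t)

    extend(1, 1)
    return out

def disc(bits):
    """Maximum over prefixes of |#zeros - #ones|."""
    total = best = 0
    for b in bits:
        total += 1 if b == 0 else -1
        best = max(best, abs(total))
    return best

def normalized_disc(n):
    """disc(L_n) * n / (2^n log n), the ratio appearing in Theorem 1 (n >= 2)."""
    return disc(lex_least_de_bruijn(n)) * n / (2 ** n * math.log(n))

if __name__ == "__main__":
    for n in range(4, 17):
        print(n, round(normalized_disc(n), 4))
```

Small orders say little about the asymptotic constant, so the printed ratios are best read as a sanity check rather than as evidence for or against the conjecture.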

Our argument will estimate the discrepancy of $\mathcal{L}_n$ by considering substrings
consisting of Lyndon words $w^\circ$ grouped by the length $k$ of their $0^k 1$ prefix. For
$0 \leq k \leq n$, let $S_k$ be the set of binary words of length $n$ containing the subword
$0^k$ but not the subword $0^{k+1}$. Then the elements of $S_k$ are precisely those $w$
so that $w^\circ$ begins with $0^k 1$. Define $S_k^\circ = \{w^\circ : w \in S_k\}$, and let $\ell_k$ be the
concatenation of the elements of $S_k^\circ$ in lexicographic order. Since the elements
of $S_k^\circ$ precede those of $S_{k-1}^\circ$ in the lexicographic order, this means that



$$\mathrm{disc}(\mathcal{L}_n) = \max_{M} \left| \sum_{k=0}^{M} (-1)^{(0\, \ell_{n-1} \cdots \ell_1\, 1)_k} \right|$$

as long as $n$ is prime.

For a binary string $w$ of length $n$, we define the skew of $w$ to be

$$\mathrm{sk}(w) = \sum_{k=0}^{n-1} (-1)^{w_k},$$

so that

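$$\mathrm{disc}(\mathcal{L}_n) = \max_{0 \leq t \leq n-2} \left( 1 + \sum_{k=1}^{t} \mathrm{sk}(\ell_{n-k}) + \mathrm{disc}(\ell_{n-t-1}) \right)$$

when $n$ is prime. This will allow us to bound the discrepancy of $\mathcal{L}_n$.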
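
The grouping of Lyndon words by their zero prefix can be made concrete for a small prime $n$. The sketch below is an illustration under stated assumptions: it enumerates the length-$n$ Lyndon words by brute force, takes $k$ to be the number of leading zeros of each, and reassembles the sequence as the single symbol 0, then $\ell_{n-1}, \ldots, \ell_1$, then the single symbol 1. The function names are hypothetical.

```python
from itertools import product

def skew(w):
    """Number of zeros minus number of ones, i.e. the sum of (-1)^{w_k}."""
    return w.count("0") - w.count("1")

def blocks_by_zero_prefix(n):
    """For prime n, group length-n Lyndon words by the length k of their 0^k 1 prefix.

    Returns {k: l_k}, where l_k concatenates the group in lexicographic order.
    """
    groups = {}
    for bits in product("01", repeat=n):
        w = "".join(bits)
        if all(w < w[i:] + w[:i] for i in range(1, n)):    # w is a Lyndon word
            k = len(w) - len(w.lstrip("0"))                # number of leading zeros
            groups.setdefault(k, []).append(w)
    return {k: "".join(sorted(ws)) for k, ws in groups.items()}

def assemble(n):
    """Rebuild the sequence as 0, l_{n-1}, ..., l_1, 1 (n prime)."""
    blocks = blocks_by_zero_prefix(n)
    return "0" + "".join(blocks[k] for k in range(n - 1, 0, -1)) + "1"
```

For $n = 5$ the blocks are $\ell_4 = 00001$, $\ell_3 = 00011$, $\ell_2 = 0010100111$ and $\ell_1 = 0101101111$, with skews $3, 1, 0, -4$. The skews sum to zero because a de Bruijn sequence contains equally many zeros and ones and the two single-symbol words cancel.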



2 Preliminaries 

Define $\alpha_k(n)$ to be the number of elements of $\{0, 1\}^n$ containing no subword $0^k$,
and let $\beta_k(n)$ be defined by

$$\beta_k(n) = \sum_{\substack{w \in \{0,1\}^n \\ 0^k \notin w}} \mathrm{sk}(w).$$

For the remainder of this section, we fix a $k \geq 2$.

Lemma 2. The sequences $a_n = \alpha_k(n)$ and $b_n = \beta_k(n)$ satisfy:

1. $a_n = \sum_{j=1}^{k} a_{n-j}$ for $n \geq k$, and

2. $b_n = \sum_{j=1}^{k} [(j - 2) a_{n-j} + b_{n-j}]$ for $n \geq k$.

Furthermore, $a_j = 2^j$ for $0 \leq j < k$ and $b_j = 0$ for $0 \leq j < k$.

Proof. Both recurrences follow from the following consideration: any string of
length at least $k$ not containing a subword $0^k$ has a left-most 1. Therefore, we
may partition the $0^k$-free sequences into those which begin with a string of the
form $0^j 1$ for $0 \leq j < k$. Removing a prefix $0^{j-1} 1$ of length $j$ leaves a $0^k$-free
string of length $n - j$ and changes the skew by $(j - 1) - 1 = j - 2$, which gives the
second recurrence. The "base case" formulas trivially follow from the fact
that every string of length less than $k$ is $0^k$-free. □
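
The two recurrences of Lemma 2 can be compared against direct enumeration for small parameters. This is a minimal check written for illustration; the function names `brute_alpha_beta` and `recurrence_alpha_beta` are hypothetical, and the brute-force side is exponential in $n$.

```python
from itertools import product

def brute_alpha_beta(k, n):
    """Count 0^k-free words of length n and sum their skews, by enumeration."""
    count = total = 0
    for bits in product("01", repeat=n):
        w = "".join(bits)
        if "0" * k not in w:
            count += 1
            total += w.count("0") - w.count("1")   # skew of w
    return count, total

def recurrence_alpha_beta(k, n_max):
    """Tabulate a_n and b_n for 0 <= n <= n_max using Lemma 2."""
    a = [2 ** j for j in range(k)]                  # base case: a_j = 2^j for j < k
    b = [0] * k                                     # base case: b_j = 0 for j < k
    for n in range(k, n_max + 1):
        a.append(sum(a[n - j] for j in range(1, k + 1)))
        b.append(sum((j - 2) * a[n - j] + b[n - j] for j in range(1, k + 1)))
    return a[:n_max + 1], b[:n_max + 1]
```

For example, with $k = 3$ the recurrence gives $a_3 = 7$ and $b_3 = -3$, matching the seven words of length 3 other than $000$ and the fact that the skews of all eight words sum to zero.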

Lemma 3. For $n - 1 \geq k \geq 3$,

$$a_{n-1} = k + \sum_{j=3}^{k} (j - 2) a_{n-j} + (k - 1) \sum_{j=0}^{n-k-1} a_j.$$

Proof. We proceed by induction. First, we verify that $a_k = k + \sum_{j=3}^{k} (j - 2) a_{k+1-j} + (k - 1) a_0$.
Note that, by the "base case" part of Lemma 2, $a_j = 2^j$
in the relevant range, except that $a_k = 2^k - 1$. Therefore,

$$\begin{aligned}
k + \sum_{j=3}^{k} (j - 2) a_{k+1-j} + (k - 1) a_0 &= k + \sum_{j=3}^{k} (j - 2) 2^{k+1-j} + k - 1 \\
&= \sum_{j=1}^{k-2} j \, 2^{k-1-j} + 2k - 1 \\
&= 2^{k-2} \sum_{j=1}^{k-2} j \, 2^{1-j} + 2k - 1 \\
&= 2^{k-2} (4 - k 2^{-k+3}) + 2k - 1 \\
&= 2^k - 2k + 2k - 1 \\
&= 2^k - 1 = a_k.
\end{aligned}$$

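Now, suppose the statement holds for $n$. Applying the first recurrence in
Lemma 2,

$$\begin{aligned}
a_n &= \sum_{j=1}^{k} a_{n-j} \\
&= a_{n-1} + \sum_{j=2}^{k} a_{n-j} \\
&= k + \sum_{j=3}^{k} (j - 2) a_{n-j} + (k - 1) \sum_{j=0}^{n-k-1} a_j + \sum_{j=2}^{k} a_{n-j} \\
&= k + \sum_{j=2}^{k} (j - 1) a_{n-j} + (k - 1) \sum_{j=0}^{n-k-1} a_j \\
&= k + \sum_{j=3}^{k} (j - 2) a_{n+1-j} + (k - 1) a_{n-k} + (k - 1) \sum_{j=0}^{n-k-1} a_j \\
&= k + \sum_{j=3}^{k} (j - 2) a_{n+1-j} + (k - 1) \sum_{j=0}^{n-k} a_j.
\end{aligned}$$

□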



Corollary 4. $b_n < 0$ for all $n - 1 \geq k \geq 3$.



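Proof. If we combine the recurrence for $b_n$ from Lemma 2 with the above Lemma 3,

$$\begin{aligned}
b_n &= \sum_{j=1}^{k} [(j - 2) a_{n-j} + b_{n-j}] \\
&= -a_{n-1} + \sum_{j=3}^{k} (j - 2) a_{n-j} + \sum_{j=1}^{k} b_{n-j} \\
&= -k - (k - 1) \sum_{j=0}^{n-k-1} a_j + \sum_{j=1}^{k} b_{n-j} < 0, \qquad (1)
\end{aligned}$$

by induction. □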
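
Lemma 3, identity (1) and the sign claim of Corollary 4 can all be tested numerically from the Lemma 2 recurrences. The sketch below is only a finite sanity check over a chosen range, not a proof, and the helper names are hypothetical.

```python
def tabulate(k, n_max):
    """a_n and b_n for 0 <= n <= n_max (with n_max >= k - 1), from Lemma 2."""
    a = [2 ** j for j in range(k)]
    b = [0] * k
    for n in range(k, n_max + 1):
        a.append(sum(a[n - j] for j in range(1, k + 1)))
        b.append(sum((j - 2) * a[n - j] + b[n - j] for j in range(1, k + 1)))
    return a, b

def lemma3_rhs(a, k, n):
    """Right-hand side of Lemma 3, to be compared with a[n - 1]."""
    return (k
            + sum((j - 2) * a[n - j] for j in range(3, k + 1))
            + (k - 1) * sum(a[:n - k]))            # sum of a_j for 0 <= j <= n-k-1

def identity1_rhs(a, b, k, n):
    """Right-hand side of identity (1), to be compared with b[n]."""
    return -k - (k - 1) * sum(a[:n - k]) + sum(b[n - j] for j in range(1, k + 1))

def check(k, n_max):
    """True if Lemma 3, identity (1) and b_n < 0 hold for k + 1 <= n <= n_max."""
    a, b = tabulate(k, n_max)
    return all(
        a[n - 1] == lemma3_rhs(a, k, n)
        and b[n] == identity1_rhs(a, b, k, n)
        and b[n] < 0
        for n in range(k + 1, n_max + 1)
    )
```

With $k = 3$ and $n = 5$, for instance, Lemma 3 reads $a_4 = 3 + a_2 + 2(a_0 + a_1) = 3 + 4 + 6 = 13$.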



Let $\rho_k$ be the largest (in absolute value) root of the polynomial $g(z) =
z^{k+1} - 2z^k + 1$. It is proven in [8] that $\rho_k$ is real, lies between 5/3 and 2, and is
unique in these respects. It is also shown there that $\rho_k \to 2$ as $k \to \infty$. Note
that

$$(z^k - z^{k-1} - \cdots - z - 1)(z - 1) = z^{k+1} - 2z^k + 1,$$

so that $\rho_k$ is a root of the left-hand polynomial $f(z) = z^k - z^{k-1} - \cdots - z - 1$ here as well. Since $f(z)$
is the characteristic polynomial for the recurrence that the $a_n$ satisfy, $\rho_k$ is the
growth rate of the $a_n$, i.e., $\lim_{n \to \infty} \log a_n / n = \log \rho_k$.
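
The root $\rho_k$ is easy to approximate numerically. The following sketch uses plain bisection on $f$ over the interval $(1, 2)$, where $f(1) = 1 - k < 0$ and $f(2) = 1 > 0$ for $k \geq 2$; the function names and the tolerance are choices made here for illustration.

```python
def f(z, k):
    """Characteristic polynomial z^k - z^{k-1} - ... - z - 1."""
    return z ** k - sum(z ** i for i in range(k))

def rho(k, tol=1e-13):
    """Approximate the root of f in (1, 2) by bisection (k >= 2)."""
    lo, hi = 1.0, 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid, k) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def growth_ratio(k, n):
    """a_n / a_{n-1} for the sequence of Lemma 2, for comparison with rho(k)."""
    a = [2 ** j for j in range(k)]
    for m in range(k, n + 1):
        a.append(sum(a[m - j] for j in range(1, k + 1)))
    return a[n] / a[n - 1]
```

For $k = 3$ this recovers the tribonacci constant, approximately $1.8393$, and for $k = 4$ a value near $1.9276$.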

Lemma 5. For all $n \geq 1$, $a_n \geq \rho_k a_{n-1}$.

Proof. Since $\rho_k < 2$, and $a_n = 2^n$ for $0 \leq n < k$, the claimed bound holds for $n$
in this range. Suppose it holds for all $n < N$. Then by Lemma 2,




k-l 



.k+l 



k 



k 




k 




= PkO-n-l- 



□ 



Lemma 6. For $k \geq 4$ and all $n \geq k$, $b_n \geq -2k a_n / 3$.
Proof. By 



n—k—1 



k 




If we suppose that $b_j \geq -\gamma k a_j$ for all $j < n$, then



n — /c — 1 



k 




7i — k—l 




By iterating Lemma 5, we have



n — k—l 
= -a„fc f — + — +7 

> -ank f— + ^p-"+^ 

We may begin by taking 7 = k{2i^^^-3) - Tie considering a^+i ^ 2^+^ - 3 
and = 1 — 2k. Then, 7 increases by at most 

n— ^ ^ n— A:+l /^A; ^ n— 



n— /c+1 n—k-\-l 



2fe - 1 



n]Pk 2^ Pk 



n=0 

Pk' 5 \ 1 



< 



2'^-! 2pt+'J 
3 5 \ 5 293 



5-15 2(5/3)V 2 500 



The conclusion follows for all n > k + 1, since |ii + ttt = < i • It is also 

— ' 500 116 3625 — 3 

easy to verify that bi^ > —2kak/i- □ 



3 Main Result 

Here we prove Theorem 1, stated in the introduction.

Proposition 7. For $4 \leq k < n$ and $n$ prime,

$$\frac{k}{3} - 2 \leq \frac{\mathrm{sk}(\ell_k)}{\alpha_{k+1}(n - k - 2)} \leq 2k - 3.$$

Proof. The set $S_k^\circ$ contains each sequence of the form $0^k 1 w$ where $w$ is a $0^k$-free
word of length $n - k - 1$. However, the quantity $\mathrm{sk}(\ell_k)$ is not quite the sum of
the skews of all $0^k$-free sequences of length $n - k - 1$ prefixed by $0^k 1$: it must
include all elements of $S_k^\circ$, not just those that have prefix $0^k$ and contain no
other runs $0^k$. For each word $w$ of length $n$ which contains more than one run
of the form $0^k$, but no runs of the form $0^{k+1}$, only one of its conjugates (namely,
$w^\circ$) appears in $S_k^\circ$. Define $\mathrm{run}(w)$ to be the maximum $k$ so that $0^k \in w$, and
let $\rho_k(w)$ be the number of subwords of the form $0^k$ in $w$, where $\mathrm{run}(w) = k$.
(Set $\rho_k(w) = 0$ otherwise.) Since we may assume that each $w$ is aperiodic, this
means that

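$$\begin{aligned}
\mathrm{sk}(\ell_k) &= \sum_{w \in S_k^\circ} \mathrm{sk}(w) \\
&= \sum_{\substack{w \in \{0,1\}^n \\ \mathrm{run}(w) = k}} \mathbf{1}(w = w^\circ) \, \mathrm{sk}(w) \\
&= \sum_{t \geq 0} \sum_{\substack{w \in \{0,1\}^{n-k-2} \\ \rho_k(w) = t}} \frac{\mathrm{sk}(0^k 1 w 1)}{t + 1}.
\end{aligned}$$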

Define the "run-print" $\mathrm{rp}(w)$ of a word $w \in \{0, 1\}^*$ with $\mathrm{run}(w) = k$ to be the
set of indices $j \in [n]$ so that $w$ has a run $0^k$ starting at index $j$. Then we may
write

t>0 «.G{0,1}""'°"^ 

t>o «.e{o,i}"~'="^ 

Pfc(-!i)) = t 

+E^ E E «kH. 

t>0 ^g(-[„-^k-21^^g{o^ij„-*=-2 

rp(iij)— 5 

Now, for a given $S$ of cardinality $t$ and $w$ with $\mathrm{rp}(w) = S$, there is a $0^k$ run
starting at location $s$ for each $s \in S$. Each such run is bounded on both sides by
a 1. In between the runs are intervals, the sum over whose skews is nonpositive,
by Corollary 4. Therefore,

rp{w) — S rp(tw) — 5 

SO we have 



sk(4)<E^ E 

t>o welQ.i}"-"-'' 

Pk{w)=t 



E^ E E ^(^-1) 

rp(itj) — 5 



<E E (^-2) + (fc-i)E E E 1 

Pk{w)=t rp(u))=S 

$= (k - 2) \alpha_{k+1}(n - k - 2) + (k - 1) \alpha_{k+1}(n - k - 2)$
$= (2k - 3) \alpha_{k+1}(n - k - 2)$.

On the other hand, by Lemma 6,



t>o 

Pfc (■"')=* 



> 



^ (fc-2)+ 5^ skH 



$= (k - 2) \alpha_{k+1}(n - k - 2) + \beta_{k+1}(n - k - 2)$
$\geq (k/3 - 2) \alpha_{k+1}(n - k - 2)$.



□ 



In the proof of Theorem 1 below, we use the following useful inequality of
Janson (see, for example, [5]). The lower bound is standard; the upper bound
is an easy modification of the one presented in [1]. Let $X$ be a finite set and let
$P$ be a random subset of $X$, with elements $x \in X$ chosen independently with
probability $p_x$. Let $\{Z_i : i \in \mathcal{I}\}$ be a system of subsets of $X$, and let $A_i$ denote
the event that $Z_i \subseteq P$. If $Z_i \cap Z_j = \emptyset$, then $A_i$ and $A_j$ are independent. Let

$$\Delta = \sum P(A_i \wedge A_j),$$

where the sum is taken over all ordered pairs $i \neq j$ with $Z_i \cap Z_j \neq \emptyset$. Finally,
define $\mu = \sum_i P(A_i)$.

Lemma 8. With $\mu$, $\Delta$ as above, if $\Delta \geq \mu/2$, then

Proof of Theorem 1. Suppose for the moment that $n$ is prime and $k \geq 4$. We
know that

$$\mathrm{disc}(\mathcal{L}_n) = \max_{k} \left( 1 + \sum_{j=1}^{k-1} \mathrm{sk}(\ell_{n-j}) + \mathrm{disc}(\ell_{n-k}) \right).$$

From Proposition 7, we have that

n n 

J2 sk(4)> (fc/3-2)-afc+i(n-fc-2) 

fc— log n+1 /c— log n+1 



> J2 (fc/3-2)-2"-'^-i(l-n2 

fc— log n+1 

^,'2"logn 
On the other hand, for any $t$,

n — 1 n — 2 



^ sk(4) < ^(2fc - 3) • ak+i{n -k-2) 

k=0 

< ^ 2fc • ak{n - fc - 1). 



fc=t fc=0 
ri-l 



fc=l 

We estimate this quantity using the inequality of Janson stated above. In
this case, we take $X = [n]$, $P$ is the set of indices where a 0 appears, $p_x = 1/2$
for every $x$, $\mathcal{I} = [n - k + 1]$, $Z_i = [i, i + k - 1]$ (i.e., the $i$-th length $k$ interval of
$[n]$), and $A_i$ is the event that a length $n$ word has a subsequence of the form $0^k$
on $Z_i$. Then

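$$\mu = (n - k + 1) 2^{-k}$$

and

$$\Delta = \sum_{\substack{1 \leq i, j \leq n-k+1 \\ 0 < |i - j| < k}} P(A_i \wedge A_j) \leq 2^{-k+1} (n - k + 1) \sum_{s=1}^{\infty} 2^{-s} = 2^{-k+1} (n - k + 1) = 2\mu.$$

Furthermore,

$$\Delta \geq 2^{-k} (n - k + 1) \sum_{s=1}^{k-1} 2^{-s} \geq 2^{-k-1} (n - k + 1) = \mu/2,$$

so the hypotheses hold. Therefore, for a uniform random choice of $w \in \{0, 1\}^n$,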

$$P(0^k \notin w) \leq e^{-\mu/12} = e^{-(n-k+1)/(12 \cdot 2^k)}.$$

Applying this bound to the above computations,

$$k \alpha_k(n - k - 1) \leq k \cdot 2^{n-k} e^{-(n-2k)/(12 \cdot 2^k)}.$$
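
The quantities $\mu$ and $\Delta$ for the zero-run events can be computed exactly, and the resulting tail bound can be compared with the true probability that a random word avoids $0^k$. The sketch below is illustrative: it assumes that two windows at distance $s < k$ are simultaneously all-zero with probability $2^{-(k+s)}$ (their union has $k + s$ positions), and the function names are hypothetical.

```python
from fractions import Fraction
import math

def mu_delta(n, k):
    """Exact mu and Delta for the events A_i = {w is all zeros on [i, i+k-1]}."""
    m = n - k + 1                                # number of length-k windows
    mu = Fraction(m, 2 ** k)
    delta = Fraction(0)
    for i in range(m):
        for j in range(m):
            s = abs(i - j)
            if 0 < s < k:                        # distinct, overlapping windows
                delta += Fraction(1, 2 ** (k + s))
    return mu, delta

def prob_no_zero_run(n, k):
    """Exact probability that a uniform word of length n contains no 0^k."""
    a = [2 ** j for j in range(k)]               # counts of 0^k-free words (Lemma 2)
    for m in range(k, n + 1):
        a.append(sum(a[m - j] for j in range(1, k + 1)))
    return Fraction(a[n], 2 ** n)
```

For $n = 50$ and $k = 5$ this gives $\mu = 46/32$ and $\Delta = 83/32$, which lies between $\mu/2$ and $2\mu$ as required by the hypothesis.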
Let T = [\ogn\. Then 

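$$\begin{aligned}
\sum_{k=1}^{n-1} k \alpha_k(n - k - 1) &\leq \sum_{k=1}^{n-1} k \cdot 2^{n-k} e^{-(n-2k)/(12 \cdot 2^k)} \\
&= 2^n \sum_{k=1}^{2 \log n} k \cdot 2^{-k} e^{-(n-2k)/(12 \cdot 2^k)} + 2^n \sum_{k=2 \log n + 1}^{n-1} k \cdot 2^{-k} e^{-(n-2k)/(12 \cdot 2^k)}
\end{aligned}$$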
A;— — oo 

oo 



2" logn 



fc— — C 

OO 



2" logn 



<2" y ^.2^,-2V48 + /2"logn 



n 

k—~'Oo 



A:— — oo 

Therefore, the total discrepancy is $\Theta(2^n \log n / n)$.

There are two more terms to consider: $\mathrm{sk}(\ell_k)$ with $k \leq 3$, and $\max_k \mathrm{disc}(\ell_{n-k})$.
The former terms are bounded by $O(\rho_4^n) = O(1.93^n)$, and therefore make an
insignificant contribution. As for the latter, the length of $\ell_{n-k}$ is bounded above by
$\alpha_{k+1}(n - k - 2)$, and the above analysis shows that this quantity is $o(2^n \log n / n)$.
Since the length of $\ell_{n-k}$ is an upper bound for $\mathrm{disc}(\ell_{n-k})$, this term also does
not affect the order of $\mathrm{disc}(\mathcal{L}_n)$.

Finally, we may drop the assumption that $n$ is prime. If $n$ is not prime, then the
above analysis is not exact: some words of length $n$, which would be part of the
concatenation that gives rise to an $\ell_k$, are in fact periodic, and therefore only
appear as their minimal roots in $\mathcal{L}_n$. (All Lyndon words of length dividing $n$
arise in this way.) However, the total number of symbols they contribute is at
most

d\n,d<n 

Hence, the asymptotic bound holds. □ 